---
title: Tutorial
keywords: fastai
sidebar: home_sidebar
summary: "The goal of this challenge is to find all instances of dolphins in a picture and then color the pixels of each dolphin with a unique color."
description: "The goal of this challenge is to find all instances of dolphins in a picture and then color the pixels of each dolphin with a unique color."
nb_path: "notebooks/00_tutorial/DolphinsTutorial.ipynb"
---
{% raw %}
{% endraw %}

Please open this notebook in Colab to edit it and submit a solution:

Open In Colab

{% raw %}
%load_ext autoreload
%autoreload 2
{% endraw %}

We need to change the runtime to GPU to speed up training:

"Change runtime"

{% raw %}
err = !nvidia-smi
if "failed" in err[0]:
    raise Exception("Change runtime in menu to GPU (Runtime->Change runtime type->GPU)")
    
!nvidia-smi
Tue Jan  5 12:28:50 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   52C    P2    58W / 275W |   1147MiB / 11177MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
{% endraw %}

Import the dolphins_recognition_challenge library if it is already installed, or install it if it is not. Please restart the runtime by clicking the button below if prompted to do so.

{% raw %}
try:
    import dolphins_recognition_challenge
except Exception:
    if "google.colab" in str(get_ipython()):
        print("Running on CoLab")
        !pip install dolphins-recognition-challenge
{% endraw %} {% raw %}
import dolphins_recognition_challenge
import numpy as np
import PIL
from PIL import Image

import torch
import torchvision
import pandas as pd
import seaborn as sns
{% endraw %}

Download data

We start by downloading and visualizing the dataset of 200 photographs, each containing one or more dolphins, split into a training set of 160 photographs and a validation set of 40 photographs.

{% raw %}
from dolphins_recognition_challenge.datasets import get_dataset, display_batches
    
data_loader, data_loader_test = get_dataset("segmentation", batch_size=3)

display_batches(data_loader, n_batches=2)
{% endraw %}

Data augmentation

In order to prevent overfitting, which happens when the dataset is too small, we perform a number of transformations to effectively increase its size. One transformation implemented in the Torchvision library is RandomHorizontalFlip, and we will implement MyColorJitter, which is basically just a wrapper around the torchvision.transforms.ColorJitter class. However, we cannot use that class directly without a wrapper, because a transformation could affect the targets and not just the image. For example, if we were to implement RandomCrop, we would need to crop the segmentation masks and readjust the bounding boxes as well.

{% raw %}
class MyColorJitter:
    def __init__(self, brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5):
        self.torch_color_jitter = torchvision.transforms.ColorJitter(
            brightness=brightness, contrast=contrast, saturation=saturation, hue=hue
        )

    def __call__(self, image, target):
        image = self.torch_color_jitter(image)
        return image, target
{% endraw %}
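To make the point about target-aware transforms concrete, here is a minimal sketch of what a crop that also adjusts the targets might look like. This `MyRandomCrop` is hypothetical (it is not part of the challenge library); it assumes the image is a CHW tensor at least as large as the crop, and that the target follows the torchvision detection convention with `"boxes"` (N, 4) and `"masks"` (N, H, W) tensors.

```python
import torch

class MyRandomCrop:
    """Hypothetical target-aware random crop, shown only to illustrate
    why transforms here must handle (image, target) pairs."""

    def __init__(self, size):
        self.h, self.w = size

    def __call__(self, image, target):
        _, H, W = image.shape
        top = torch.randint(0, H - self.h + 1, (1,)).item()
        left = torch.randint(0, W - self.w + 1, (1,)).item()

        # crop the image and the segmentation masks identically
        image = image[:, top : top + self.h, left : left + self.w]
        target["masks"] = target["masks"][:, top : top + self.h, left : left + self.w]

        # shift the bounding boxes into crop coordinates and clamp to the crop
        boxes = target["boxes"] - torch.tensor([left, top, left, top])
        boxes[:, [0, 2]] = boxes[:, [0, 2]].clamp(0, self.w)
        boxes[:, [1, 3]] = boxes[:, [1, 3]].clamp(0, self.h)
        target["boxes"] = boxes
        return image, target
```

A full implementation would also need to drop instances whose boxes end up empty after clamping; this sketch omits that for brevity.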

We will make a series of transformations on an image and combine all of those transformations into a single one as follows:

{% raw %}
from dolphins_recognition_challenge.datasets import ToTensor, ToPILImage, Compose, RandomHorizontalFlip

def get_tensor_transforms(train):
    transforms = []
    # converts the image, a PIL image, into a PyTorch Tensor
    transforms.append(ToTensor())
    if train:
        # during training, randomly flip the training images
        # and ground-truth for data augmentation
        transforms.append(
            MyColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
        )
        transforms.append(RandomHorizontalFlip(0.5))
        # TODO: add additional transforms: e.g. random crop
    return Compose(transforms)
{% endraw %}
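Conceptually, the Compose used above just threads the (image, target) pair through each transform in turn. A minimal sketch of the idea (this `PairedCompose` is hypothetical; the library's own Compose may differ in details):

```python
class PairedCompose:
    """Hypothetical illustration of composing (image, target) transforms."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        # apply each transform in order, passing both image and target through
        for t in self.transforms:
            image, target = t(image, target)
        return image, target
```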

With data augmentation defined, we are ready to generate the actual datasets used for training our models.

{% raw %}
batch_size = 4

data_loader, data_loader_test = get_dataset(
    "segmentation", get_tensor_transforms=get_tensor_transforms, batch_size=batch_size
)

display_batches(data_loader, n_batches=4)
{% endraw %}

{% include tip.html content='incorporate more transformation classes such as RandomCrop etc. (https://pytorch.org/docs/stable/torchvision/transforms.html)' %}

Model

We can reuse a model already trained for instance segmentation on another dataset and fine-tune it for our particular problem, in our case on the dataset with dolphins.

{% raw %}
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def get_instance_segmentation_model(hidden_layer_size, box_score_thresh=0.5):
    # our dataset has two classes only - background and dolphin    
    num_classes = 2
    
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        pretrained=True, 
        box_score_thresh=box_score_thresh, 
    )

    # get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels

    model.roi_heads.mask_predictor = MaskRCNNPredictor(
        in_channels=in_features_mask, 
        dim_reduced=hidden_layer_size,
        num_classes=num_classes
    )

    return model
{% endraw %}

Before using the constructed model, we should move it to the appropriate device. We test whether a GPU is available and move the model there if possible.

{% raw %}
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# get the model using our helper function
model = get_instance_segmentation_model(hidden_layer_size=256)

# move model to the right device
model.to(device)

# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)

# and a learning rate scheduler which decreases the learning rate by
# 10x every 10 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
Downloading: "https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth" to /root/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
{% endraw %}
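As a quick illustration of how StepLR behaves with the settings above (step_size=10, gamma=0.1), here is a self-contained sketch driven by a dummy parameter:

```python
import torch

# dummy parameter just to drive the optimizer and scheduler
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.005)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(20):
    optimizer.step()  # one epoch of training would happen here
    lrs.append(optimizer.param_groups[0]["lr"])
    lr_scheduler.step()  # decay the learning rate at the end of the epoch

# epochs 0-9 use lr=0.005; from epoch 10 on it drops to 0.0005
```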

We have implemented a function for training the model for one epoch, meaning each image from the training dataset is used exactly once. Let's train for a couple of epochs and see what predictions the model makes before and after that.
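The train_one_epoch helper roughly corresponds to the following loop. This is a simplified sketch, not the library's actual implementation (which also logs losses and applies a learning-rate warm-up); it relies on the fact that a torchvision detection model in train mode returns a dict of losses rather than predictions.

```python
import torch

def train_one_epoch_sketch(model, optimizer, data_loader, device):
    """Hypothetical simplification of the library's train_one_epoch."""
    model.train()
    for images, targets in data_loader:
        # move every image and every tensor in the targets to the device
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # in train mode the model returns a dict of named losses
        loss_dict = model(images, targets)
        loss = sum(loss_dict.values())

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```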

{% raw %}
data_loader, data_loader_test = get_dataset(
    "segmentation",
    batch_size=4,
    get_tensor_transforms=get_tensor_transforms,
    n_samples=8,
)
{% endraw %} {% raw %}
data_loader, data_loader_test = get_dataset(
    "segmentation", get_tensor_transforms=get_tensor_transforms, batch_size=batch_size
)
{% endraw %} {% raw %}
from dolphins_recognition_challenge.instance_segmentation.model import train_one_epoch
from dolphins_recognition_challenge.instance_segmentation.model import show_predictions

show_predictions(model, data_loader=data_loader_test, n=1, score_threshold=0.5)

num_epochs = 1

for epoch in range(num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch=epoch, print_freq=20)

train_one_epoch(model, optimizer, data_loader, device, epoch=1, print_freq=20)

show_predictions(model, data_loader=data_loader_test, n=1, score_threshold=0.5)
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:3103: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
  warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "
Epoch: [0]  [ 0/40]  eta: 0:01:10  lr: 0.000133  loss: 5.4089 (5.4089)  loss_classifier: 0.6839 (0.6839)  loss_box_reg: 0.2898 (0.2898)  loss_mask: 4.4265 (4.4265)  loss_objectness: 0.0006 (0.0006)  loss_rpn_box_reg: 0.0080 (0.0080)  time: 1.7720  data: 0.9526  max mem: 4477
Epoch: [0]  [20/40]  eta: 0:00:15  lr: 0.002695  loss: 0.9911 (1.7665)  loss_classifier: 0.2020 (0.2973)  loss_box_reg: 0.2339 (0.2436)  loss_mask: 0.4659 (1.1417)  loss_objectness: 0.0188 (0.0392)  loss_rpn_box_reg: 0.0243 (0.0447)  time: 0.6990  data: 0.0148  max mem: 4757
Epoch: [0]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.6616 (1.2470)  loss_classifier: 0.1207 (0.2161)  loss_box_reg: 0.1964 (0.2248)  loss_mask: 0.2759 (0.7356)  loss_objectness: 0.0159 (0.0301)  loss_rpn_box_reg: 0.0159 (0.0402)  time: 0.6706  data: 0.0113  max mem: 5184
Epoch: [0] Total time: 0:00:28 (0.7132 s / it)
Epoch: [1]  [ 0/40]  eta: 0:01:08  lr: 0.005000  loss: 0.5226 (0.5226)  loss_classifier: 0.0928 (0.0928)  loss_box_reg: 0.1798 (0.1798)  loss_mask: 0.2171 (0.2171)  loss_objectness: 0.0201 (0.0201)  loss_rpn_box_reg: 0.0129 (0.0129)  time: 1.7046  data: 1.0305  max mem: 5184
Epoch: [1]  [20/40]  eta: 0:00:14  lr: 0.005000  loss: 0.6045 (0.6287)  loss_classifier: 0.0911 (0.0946)  loss_box_reg: 0.2004 (0.2065)  loss_mask: 0.2504 (0.2617)  loss_objectness: 0.0102 (0.0149)  loss_rpn_box_reg: 0.0180 (0.0511)  time: 0.6989  data: 0.0141  max mem: 5184
Epoch: [1]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.5161 (0.5776)  loss_classifier: 0.0908 (0.0937)  loss_box_reg: 0.1703 (0.1956)  loss_mask: 0.2039 (0.2432)  loss_objectness: 0.0073 (0.0112)  loss_rpn_box_reg: 0.0138 (0.0339)  time: 0.6817  data: 0.0116  max mem: 5184
Epoch: [1] Total time: 0:00:28 (0.7169 s / it)
{% endraw %}

Now we can fully train the model for more epochs, in this case for 19 more (epochs 1 through 19).

{% raw %}
num_epochs = 20

data_loader, data_loader_test = get_dataset(
    "segmentation", batch_size=4, get_tensor_transforms=get_tensor_transforms
)
{% endraw %} {% raw %}
for epoch in range(1, num_epochs):
    # train for one epoch, printing every 10 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch=epoch, print_freq=20)

    lr_scheduler.step()
Epoch: [1]  [ 0/40]  eta: 0:01:08  lr: 0.005000  loss: 0.5627 (0.5627)  loss_classifier: 0.0863 (0.0863)  loss_box_reg: 0.1434 (0.1434)  loss_mask: 0.2966 (0.2966)  loss_objectness: 0.0206 (0.0206)  loss_rpn_box_reg: 0.0158 (0.0158)  time: 1.7187  data: 0.9897  max mem: 5184
Epoch: [1]  [20/40]  eta: 0:00:14  lr: 0.005000  loss: 0.4207 (0.4558)  loss_classifier: 0.0734 (0.0762)  loss_box_reg: 0.1401 (0.1559)  loss_mask: 0.1883 (0.1992)  loss_objectness: 0.0054 (0.0072)  loss_rpn_box_reg: 0.0100 (0.0174)  time: 0.6977  data: 0.0109  max mem: 5184
Epoch: [1]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.4653 (0.4748)  loss_classifier: 0.0828 (0.0812)  loss_box_reg: 0.1526 (0.1589)  loss_mask: 0.1996 (0.2007)  loss_objectness: 0.0054 (0.0075)  loss_rpn_box_reg: 0.0091 (0.0265)  time: 0.6904  data: 0.0115  max mem: 5186
Epoch: [1] Total time: 0:00:28 (0.7205 s / it)
Epoch: [2]  [ 0/40]  eta: 0:01:10  lr: 0.005000  loss: 0.4450 (0.4450)  loss_classifier: 0.0741 (0.0741)  loss_box_reg: 0.1489 (0.1489)  loss_mask: 0.2085 (0.2085)  loss_objectness: 0.0021 (0.0021)  loss_rpn_box_reg: 0.0113 (0.0113)  time: 1.7695  data: 1.0363  max mem: 5186
Epoch: [2]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.4055 (0.4082)  loss_classifier: 0.0630 (0.0671)  loss_box_reg: 0.1291 (0.1356)  loss_mask: 0.1807 (0.1859)  loss_objectness: 0.0046 (0.0046)  loss_rpn_box_reg: 0.0099 (0.0150)  time: 0.7073  data: 0.0123  max mem: 5186
Epoch: [2]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.4854 (0.4373)  loss_classifier: 0.0803 (0.0732)  loss_box_reg: 0.1555 (0.1480)  loss_mask: 0.1874 (0.1871)  loss_objectness: 0.0059 (0.0073)  loss_rpn_box_reg: 0.0107 (0.0217)  time: 0.7162  data: 0.0141  max mem: 5186
Epoch: [2] Total time: 0:00:29 (0.7400 s / it)
Epoch: [3]  [ 0/40]  eta: 0:01:19  lr: 0.005000  loss: 0.5686 (0.5686)  loss_classifier: 0.0889 (0.0889)  loss_box_reg: 0.1940 (0.1940)  loss_mask: 0.2602 (0.2602)  loss_objectness: 0.0124 (0.0124)  loss_rpn_box_reg: 0.0131 (0.0131)  time: 1.9988  data: 1.2436  max mem: 5186
Epoch: [3]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.3962 (0.4104)  loss_classifier: 0.0686 (0.0733)  loss_box_reg: 0.1476 (0.1482)  loss_mask: 0.1488 (0.1620)  loss_objectness: 0.0038 (0.0049)  loss_rpn_box_reg: 0.0109 (0.0219)  time: 0.7133  data: 0.0108  max mem: 5186
Epoch: [3]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.3773 (0.3975)  loss_classifier: 0.0655 (0.0690)  loss_box_reg: 0.1168 (0.1382)  loss_mask: 0.1673 (0.1681)  loss_objectness: 0.0032 (0.0049)  loss_rpn_box_reg: 0.0061 (0.0172)  time: 0.7110  data: 0.0132  max mem: 5186
Epoch: [3] Total time: 0:00:29 (0.7473 s / it)
Epoch: [4]  [ 0/40]  eta: 0:01:11  lr: 0.005000  loss: 0.3412 (0.3412)  loss_classifier: 0.0437 (0.0437)  loss_box_reg: 0.0895 (0.0895)  loss_mask: 0.1972 (0.1972)  loss_objectness: 0.0066 (0.0066)  loss_rpn_box_reg: 0.0043 (0.0043)  time: 1.7795  data: 1.0204  max mem: 5186
Epoch: [4]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.3451 (0.3825)  loss_classifier: 0.0585 (0.0619)  loss_box_reg: 0.1118 (0.1192)  loss_mask: 0.1646 (0.1738)  loss_objectness: 0.0023 (0.0048)  loss_rpn_box_reg: 0.0069 (0.0228)  time: 0.7097  data: 0.0114  max mem: 5186
Epoch: [4]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.3398 (0.3686)  loss_classifier: 0.0514 (0.0582)  loss_box_reg: 0.0974 (0.1147)  loss_mask: 0.1551 (0.1710)  loss_objectness: 0.0024 (0.0042)  loss_rpn_box_reg: 0.0085 (0.0205)  time: 0.6938  data: 0.0113  max mem: 5186
Epoch: [4] Total time: 0:00:29 (0.7304 s / it)
Epoch: [5]  [ 0/40]  eta: 0:01:17  lr: 0.005000  loss: 0.3025 (0.3025)  loss_classifier: 0.0467 (0.0467)  loss_box_reg: 0.1068 (0.1068)  loss_mask: 0.1398 (0.1398)  loss_objectness: 0.0062 (0.0062)  loss_rpn_box_reg: 0.0029 (0.0029)  time: 1.9265  data: 1.1509  max mem: 5186
Epoch: [5]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.3195 (0.3398)  loss_classifier: 0.0464 (0.0515)  loss_box_reg: 0.0939 (0.1128)  loss_mask: 0.1687 (0.1578)  loss_objectness: 0.0027 (0.0037)  loss_rpn_box_reg: 0.0062 (0.0139)  time: 0.7121  data: 0.0114  max mem: 5187
Epoch: [5]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.3195 (0.3356)  loss_classifier: 0.0390 (0.0490)  loss_box_reg: 0.0923 (0.1052)  loss_mask: 0.1317 (0.1518)  loss_objectness: 0.0024 (0.0035)  loss_rpn_box_reg: 0.0104 (0.0261)  time: 0.6914  data: 0.0104  max mem: 5187
Epoch: [5] Total time: 0:00:29 (0.7333 s / it)
Epoch: [6]  [ 0/40]  eta: 0:01:11  lr: 0.005000  loss: 0.3845 (0.3845)  loss_classifier: 0.0422 (0.0422)  loss_box_reg: 0.0695 (0.0695)  loss_mask: 0.1620 (0.1620)  loss_objectness: 0.0066 (0.0066)  loss_rpn_box_reg: 0.1042 (0.1042)  time: 1.7774  data: 0.9542  max mem: 5187
Epoch: [6]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.3038 (0.3195)  loss_classifier: 0.0401 (0.0457)  loss_box_reg: 0.0961 (0.1021)  loss_mask: 0.1445 (0.1554)  loss_objectness: 0.0017 (0.0029)  loss_rpn_box_reg: 0.0072 (0.0134)  time: 0.7026  data: 0.0095  max mem: 5187
Epoch: [6]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.3011 (0.3176)  loss_classifier: 0.0427 (0.0456)  loss_box_reg: 0.0820 (0.0975)  loss_mask: 0.1412 (0.1513)  loss_objectness: 0.0017 (0.0026)  loss_rpn_box_reg: 0.0084 (0.0206)  time: 0.6895  data: 0.0100  max mem: 5187
Epoch: [6] Total time: 0:00:29 (0.7253 s / it)
Epoch: [7]  [ 0/40]  eta: 0:01:14  lr: 0.005000  loss: 0.2176 (0.2176)  loss_classifier: 0.0274 (0.0274)  loss_box_reg: 0.0646 (0.0646)  loss_mask: 0.1242 (0.1242)  loss_objectness: 0.0004 (0.0004)  loss_rpn_box_reg: 0.0010 (0.0010)  time: 1.8641  data: 1.1166  max mem: 5187
Epoch: [7]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.2853 (0.3064)  loss_classifier: 0.0428 (0.0448)  loss_box_reg: 0.0906 (0.0987)  loss_mask: 0.1394 (0.1508)  loss_objectness: 0.0015 (0.0026)  loss_rpn_box_reg: 0.0053 (0.0094)  time: 0.7074  data: 0.0086  max mem: 5187
Epoch: [7]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.2454 (0.2998)  loss_classifier: 0.0364 (0.0433)  loss_box_reg: 0.0794 (0.0938)  loss_mask: 0.1262 (0.1445)  loss_objectness: 0.0026 (0.0026)  loss_rpn_box_reg: 0.0093 (0.0156)  time: 0.6953  data: 0.0101  max mem: 5187
Epoch: [7] Total time: 0:00:29 (0.7319 s / it)
Epoch: [8]  [ 0/40]  eta: 0:01:10  lr: 0.005000  loss: 0.2965 (0.2965)  loss_classifier: 0.0588 (0.0588)  loss_box_reg: 0.0810 (0.0810)  loss_mask: 0.1369 (0.1369)  loss_objectness: 0.0017 (0.0017)  loss_rpn_box_reg: 0.0182 (0.0182)  time: 1.7511  data: 1.0845  max mem: 5187
Epoch: [8]  [20/40]  eta: 0:00:14  lr: 0.005000  loss: 0.2857 (0.2856)  loss_classifier: 0.0427 (0.0438)  loss_box_reg: 0.0904 (0.0924)  loss_mask: 0.1366 (0.1367)  loss_objectness: 0.0012 (0.0026)  loss_rpn_box_reg: 0.0063 (0.0101)  time: 0.6777  data: 0.0094  max mem: 5187
Epoch: [8]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.2927 (0.2919)  loss_classifier: 0.0424 (0.0440)  loss_box_reg: 0.0921 (0.0930)  loss_mask: 0.1367 (0.1400)  loss_objectness: 0.0015 (0.0026)  loss_rpn_box_reg: 0.0052 (0.0123)  time: 0.6956  data: 0.0114  max mem: 5187
Epoch: [8] Total time: 0:00:28 (0.7141 s / it)
Epoch: [9]  [ 0/40]  eta: 0:01:12  lr: 0.005000  loss: 0.3120 (0.3120)  loss_classifier: 0.0560 (0.0560)  loss_box_reg: 0.1078 (0.1078)  loss_mask: 0.1428 (0.1428)  loss_objectness: 0.0004 (0.0004)  loss_rpn_box_reg: 0.0049 (0.0049)  time: 1.8102  data: 1.0307  max mem: 5187
Epoch: [9]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.2736 (0.2901)  loss_classifier: 0.0404 (0.0424)  loss_box_reg: 0.0929 (0.0964)  loss_mask: 0.1306 (0.1382)  loss_objectness: 0.0013 (0.0022)  loss_rpn_box_reg: 0.0074 (0.0109)  time: 0.7114  data: 0.0108  max mem: 5187
Epoch: [9]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.2715 (0.2951)  loss_classifier: 0.0386 (0.0422)  loss_box_reg: 0.0886 (0.0954)  loss_mask: 0.1303 (0.1349)  loss_objectness: 0.0009 (0.0061)  loss_rpn_box_reg: 0.0068 (0.0165)  time: 0.7090  data: 0.0109  max mem: 5187
Epoch: [9] Total time: 0:00:29 (0.7387 s / it)
Epoch: [10]  [ 0/40]  eta: 0:01:12  lr: 0.005000  loss: 0.3104 (0.3104)  loss_classifier: 0.0554 (0.0554)  loss_box_reg: 0.1062 (0.1062)  loss_mask: 0.1262 (0.1262)  loss_objectness: 0.0098 (0.0098)  loss_rpn_box_reg: 0.0128 (0.0128)  time: 1.8117  data: 1.0588  max mem: 5187
Epoch: [10]  [20/40]  eta: 0:00:15  lr: 0.005000  loss: 0.2808 (0.3097)  loss_classifier: 0.0435 (0.0446)  loss_box_reg: 0.0940 (0.0983)  loss_mask: 0.1328 (0.1378)  loss_objectness: 0.0059 (0.0111)  loss_rpn_box_reg: 0.0123 (0.0179)  time: 0.7186  data: 0.0114  max mem: 5187
Epoch: [10]  [39/40]  eta: 0:00:00  lr: 0.005000  loss: 0.2724 (0.2910)  loss_classifier: 0.0395 (0.0425)  loss_box_reg: 0.0846 (0.0938)  loss_mask: 0.1331 (0.1350)  loss_objectness: 0.0023 (0.0072)  loss_rpn_box_reg: 0.0052 (0.0125)  time: 0.7079  data: 0.0142  max mem: 5187
Epoch: [10] Total time: 0:00:29 (0.7418 s / it)
Epoch: [11]  [ 0/40]  eta: 0:01:13  lr: 0.000500  loss: 0.2395 (0.2395)  loss_classifier: 0.0327 (0.0327)  loss_box_reg: 0.0650 (0.0650)  loss_mask: 0.1346 (0.1346)  loss_objectness: 0.0011 (0.0011)  loss_rpn_box_reg: 0.0061 (0.0061)  time: 1.8463  data: 1.0932  max mem: 5187
Epoch: [11]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2365 (0.2678)  loss_classifier: 0.0343 (0.0399)  loss_box_reg: 0.0677 (0.0735)  loss_mask: 0.1183 (0.1244)  loss_objectness: 0.0031 (0.0035)  loss_rpn_box_reg: 0.0059 (0.0265)  time: 0.7085  data: 0.0098  max mem: 5187
Epoch: [11]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2437 (0.2580)  loss_classifier: 0.0341 (0.0372)  loss_box_reg: 0.0702 (0.0718)  loss_mask: 0.1337 (0.1298)  loss_objectness: 0.0014 (0.0027)  loss_rpn_box_reg: 0.0041 (0.0164)  time: 0.6944  data: 0.0101  max mem: 5187
Epoch: [11] Total time: 0:00:29 (0.7308 s / it)
Epoch: [12]  [ 0/40]  eta: 0:01:11  lr: 0.000500  loss: 0.5819 (0.5819)  loss_classifier: 0.0460 (0.0460)  loss_box_reg: 0.0943 (0.0943)  loss_mask: 0.1268 (0.1268)  loss_objectness: 0.0030 (0.0030)  loss_rpn_box_reg: 0.3118 (0.3118)  time: 1.7838  data: 1.0627  max mem: 5187
Epoch: [12]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2540 (0.2715)  loss_classifier: 0.0351 (0.0386)  loss_box_reg: 0.0742 (0.0770)  loss_mask: 0.1331 (0.1327)  loss_objectness: 0.0015 (0.0019)  loss_rpn_box_reg: 0.0032 (0.0212)  time: 0.7032  data: 0.0104  max mem: 5187
Epoch: [12]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2361 (0.2541)  loss_classifier: 0.0333 (0.0362)  loss_box_reg: 0.0597 (0.0707)  loss_mask: 0.1323 (0.1296)  loss_objectness: 0.0015 (0.0024)  loss_rpn_box_reg: 0.0046 (0.0152)  time: 0.6968  data: 0.0113  max mem: 5187
Epoch: [12] Total time: 0:00:29 (0.7294 s / it)
Epoch: [13]  [ 0/40]  eta: 0:01:06  lr: 0.000500  loss: 0.2396 (0.2396)  loss_classifier: 0.0312 (0.0312)  loss_box_reg: 0.0615 (0.0615)  loss_mask: 0.1412 (0.1412)  loss_objectness: 0.0006 (0.0006)  loss_rpn_box_reg: 0.0052 (0.0052)  time: 1.6561  data: 0.9426  max mem: 5187
Epoch: [13]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2219 (0.2478)  loss_classifier: 0.0378 (0.0365)  loss_box_reg: 0.0643 (0.0723)  loss_mask: 0.1238 (0.1311)  loss_objectness: 0.0020 (0.0023)  loss_rpn_box_reg: 0.0036 (0.0057)  time: 0.7082  data: 0.0092  max mem: 5187
Epoch: [13]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2312 (0.2456)  loss_classifier: 0.0356 (0.0346)  loss_box_reg: 0.0626 (0.0676)  loss_mask: 0.1185 (0.1270)  loss_objectness: 0.0016 (0.0023)  loss_rpn_box_reg: 0.0038 (0.0141)  time: 0.6852  data: 0.0100  max mem: 5187
Epoch: [13] Total time: 0:00:28 (0.7218 s / it)
Epoch: [14]  [ 0/40]  eta: 0:01:12  lr: 0.000500  loss: 0.2305 (0.2305)  loss_classifier: 0.0379 (0.0379)  loss_box_reg: 0.0587 (0.0587)  loss_mask: 0.1267 (0.1267)  loss_objectness: 0.0014 (0.0014)  loss_rpn_box_reg: 0.0057 (0.0057)  time: 1.8212  data: 1.0400  max mem: 5187
Epoch: [14]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2218 (0.2383)  loss_classifier: 0.0336 (0.0343)  loss_box_reg: 0.0677 (0.0694)  loss_mask: 0.1171 (0.1269)  loss_objectness: 0.0015 (0.0021)  loss_rpn_box_reg: 0.0049 (0.0056)  time: 0.7078  data: 0.0093  max mem: 5187
Epoch: [14]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2060 (0.2471)  loss_classifier: 0.0303 (0.0342)  loss_box_reg: 0.0582 (0.0666)  loss_mask: 0.1211 (0.1273)  loss_objectness: 0.0007 (0.0022)  loss_rpn_box_reg: 0.0031 (0.0168)  time: 0.7043  data: 0.0112  max mem: 5187
Epoch: [14] Total time: 0:00:29 (0.7359 s / it)
Epoch: [15]  [ 0/40]  eta: 0:01:14  lr: 0.000500  loss: 0.3162 (0.3162)  loss_classifier: 0.0479 (0.0479)  loss_box_reg: 0.0911 (0.0911)  loss_mask: 0.1574 (0.1574)  loss_objectness: 0.0064 (0.0064)  loss_rpn_box_reg: 0.0134 (0.0134)  time: 1.8568  data: 1.1088  max mem: 5187
Epoch: [15]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2390 (0.2558)  loss_classifier: 0.0329 (0.0353)  loss_box_reg: 0.0636 (0.0691)  loss_mask: 0.1268 (0.1302)  loss_objectness: 0.0018 (0.0021)  loss_rpn_box_reg: 0.0043 (0.0192)  time: 0.7131  data: 0.0112  max mem: 5187
Epoch: [15]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2120 (0.2427)  loss_classifier: 0.0322 (0.0339)  loss_box_reg: 0.0645 (0.0665)  loss_mask: 0.1174 (0.1273)  loss_objectness: 0.0012 (0.0021)  loss_rpn_box_reg: 0.0031 (0.0129)  time: 0.6973  data: 0.0105  max mem: 5187
Epoch: [15] Total time: 0:00:29 (0.7352 s / it)
Epoch: [16]  [ 0/40]  eta: 0:01:12  lr: 0.000500  loss: 0.2203 (0.2203)  loss_classifier: 0.0322 (0.0322)  loss_box_reg: 0.0623 (0.0623)  loss_mask: 0.1183 (0.1183)  loss_objectness: 0.0009 (0.0009)  loss_rpn_box_reg: 0.0067 (0.0067)  time: 1.8014  data: 1.0053  max mem: 5187
Epoch: [16]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2405 (0.2549)  loss_classifier: 0.0342 (0.0383)  loss_box_reg: 0.0646 (0.0683)  loss_mask: 0.1267 (0.1285)  loss_objectness: 0.0019 (0.0025)  loss_rpn_box_reg: 0.0035 (0.0174)  time: 0.7138  data: 0.0100  max mem: 5187
Epoch: [16]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2160 (0.2362)  loss_classifier: 0.0294 (0.0346)  loss_box_reg: 0.0565 (0.0634)  loss_mask: 0.1233 (0.1246)  loss_objectness: 0.0012 (0.0020)  loss_rpn_box_reg: 0.0042 (0.0117)  time: 0.6955  data: 0.0100  max mem: 5187
Epoch: [16] Total time: 0:00:29 (0.7331 s / it)
Epoch: [17]  [ 0/40]  eta: 0:01:05  lr: 0.000500  loss: 0.1999 (0.1999)  loss_classifier: 0.0295 (0.0295)  loss_box_reg: 0.0586 (0.0586)  loss_mask: 0.1091 (0.1091)  loss_objectness: 0.0009 (0.0009)  loss_rpn_box_reg: 0.0018 (0.0018)  time: 1.6261  data: 0.8904  max mem: 5187
Epoch: [17]  [20/40]  eta: 0:00:14  lr: 0.000500  loss: 0.2253 (0.2499)  loss_classifier: 0.0322 (0.0334)  loss_box_reg: 0.0611 (0.0672)  loss_mask: 0.1288 (0.1248)  loss_objectness: 0.0010 (0.0014)  loss_rpn_box_reg: 0.0045 (0.0230)  time: 0.7046  data: 0.0112  max mem: 5187
Epoch: [17]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2449 (0.2442)  loss_classifier: 0.0356 (0.0340)  loss_box_reg: 0.0641 (0.0678)  loss_mask: 0.1330 (0.1260)  loss_objectness: 0.0014 (0.0016)  loss_rpn_box_reg: 0.0043 (0.0148)  time: 0.7068  data: 0.0103  max mem: 5187
Epoch: [17] Total time: 0:00:29 (0.7300 s / it)
Epoch: [18]  [ 0/40]  eta: 0:01:18  lr: 0.000500  loss: 0.2989 (0.2989)  loss_classifier: 0.0509 (0.0509)  loss_box_reg: 0.0954 (0.0954)  loss_mask: 0.1424 (0.1424)  loss_objectness: 0.0016 (0.0016)  loss_rpn_box_reg: 0.0086 (0.0086)  time: 1.9511  data: 1.1586  max mem: 5187
Epoch: [18]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2209 (0.2408)  loss_classifier: 0.0322 (0.0351)  loss_box_reg: 0.0661 (0.0686)  loss_mask: 0.1236 (0.1287)  loss_objectness: 0.0010 (0.0021)  loss_rpn_box_reg: 0.0029 (0.0063)  time: 0.7152  data: 0.0089  max mem: 5187
Epoch: [18]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2201 (0.2362)  loss_classifier: 0.0330 (0.0337)  loss_box_reg: 0.0604 (0.0659)  loss_mask: 0.1192 (0.1238)  loss_objectness: 0.0011 (0.0020)  loss_rpn_box_reg: 0.0043 (0.0108)  time: 0.7051  data: 0.0107  max mem: 5187
Epoch: [18] Total time: 0:00:29 (0.7429 s / it)
Epoch: [19]  [ 0/40]  eta: 0:01:07  lr: 0.000500  loss: 0.2866 (0.2866)  loss_classifier: 0.0346 (0.0346)  loss_box_reg: 0.0807 (0.0807)  loss_mask: 0.1660 (0.1660)  loss_objectness: 0.0008 (0.0008)  loss_rpn_box_reg: 0.0046 (0.0046)  time: 1.6856  data: 0.9269  max mem: 5187
Epoch: [19]  [20/40]  eta: 0:00:15  lr: 0.000500  loss: 0.2472 (0.2487)  loss_classifier: 0.0291 (0.0334)  loss_box_reg: 0.0565 (0.0657)  loss_mask: 0.1231 (0.1257)  loss_objectness: 0.0011 (0.0019)  loss_rpn_box_reg: 0.0038 (0.0220)  time: 0.7276  data: 0.0130  max mem: 5187
Epoch: [19]  [39/40]  eta: 0:00:00  lr: 0.000500  loss: 0.2089 (0.2409)  loss_classifier: 0.0299 (0.0337)  loss_box_reg: 0.0639 (0.0661)  loss_mask: 0.1205 (0.1255)  loss_objectness: 0.0008 (0.0015)  loss_rpn_box_reg: 0.0034 (0.0142)  time: 0.6967  data: 0.0098  max mem: 5187
Epoch: [19] Total time: 0:00:29 (0.7388 s / it)
{% endraw %}

Calculate metrics

Visualise a few samples and print the IOU metric for each of them:

{% raw %}
from dolphins_recognition_challenge.instance_segmentation.model import show_prediction, iou_metric_example

for i in range(4):
    iou_test_image = iou_metric_example(model, data_loader_test.dataset[i], 0.5)
    img, _ = data_loader_test.dataset[i]
    print(f"IOU metric for the input image is: {iou_test_image}")
    show_prediction(model, img, width=820)
IOU metric for the input image is: 0.6163980828568573
IOU metric for the input image is: 0.6188050720160738
IOU metric for the input image is: 0.5790598755916212
IOU metric for the input image is: 0.5138045344933209
{% endraw %}
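For reference, the IOU (intersection over union) of a single pair of binary masks is the area they share divided by the area they jointly cover. A minimal sketch (this `mask_iou` helper is hypothetical; the library's `iou_metric` additionally handles matching predicted instances to ground-truth instances):

```python
import numpy as np

def mask_iou(pred, truth):
    """IOU of two boolean masks: |A ∩ B| / |A ∪ B|."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, truth).sum() / union
```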

Calculate the mean IOU metric for the entire test set:

{% raw %}
%%time

from dolphins_recognition_challenge.instance_segmentation.model import iou_metric, show_predictions_sorted_by_iou

mean_iou_testset, _ = iou_metric(model, data_loader_test.dataset)

print(f"Mean IOU metric for the test set is: {mean_iou_testset}")
Mean IOU metric for the test set is: 0.45871173589935726
CPU times: user 10.6 s, sys: 19.9 ms, total: 10.6 s
Wall time: 7.08 s
{% endraw %}

...

{% raw %}
show_predictions_sorted_by_iou(model, data_loader_test.dataset)
IOU metric: 0.23184597232650433
IOU metric: 0.24581407779447353
IOU metric: 0.24618816019374148
IOU metric: 0.2555991486030988
IOU metric: 0.3009551120616335
IOU metric: 0.30301062114673694
IOU metric: 0.31223090414023597
IOU metric: 0.32287201526036274
IOU metric: 0.35421019210181454
IOU metric: 0.3611191795266899
IOU metric: 0.3900684088378557
IOU metric: 0.3944734436506459
IOU metric: 0.39596665889815275
IOU metric: 0.4046655371894457
IOU metric: 0.40473742236010396
IOU metric: 0.4067293943715474
IOU metric: 0.4068489354652476
IOU metric: 0.40797115209198775
IOU metric: 0.4183618790635502
IOU metric: 0.4330426583394982
IOU metric: 0.43434213493463114
IOU metric: 0.4355279368169243
IOU metric: 0.4455225465549054
IOU metric: 0.48451056615999233
IOU metric: 0.5004015227947641
IOU metric: 0.512984890581837
IOU metric: 0.5138045344933209
IOU metric: 0.5445646280196771
IOU metric: 0.5735681293787354
IOU metric: 0.5762955944824969
IOU metric: 0.5790598755916212
IOU metric: 0.6039832798984675
IOU metric: 0.6163980828568573
IOU metric: 0.6188050720160738
IOU metric: 0.6271841589218914
IOU metric: 0.6621209775426122
IOU metric: 0.6829042616073147
IOU metric: 0.6868579776768257
IOU metric: 0.7942106563226543
{% endraw %}

Submit solution

Here we can see how to use the submit_model function. We must pass the trained model, an alias that will be displayed on the leaderboard, a name, and an email address. The function returns the path to the zipped file.

{% raw %}
from dolphins_recognition_challenge.submissions import submit_model

zip_fname = submit_model(model, alias="dolphin123", name="Name Surname", email="name.surname@gmail.com")
{% endraw %}

Here we can check what is in the zip file. It contains the model and two CSV files: the first contains the IOU metric for each image from the validation set, and the second contains information about the competitor.

{% raw %}
!unzip -vl "{zip_fname}"
Archive:  submission-iou=0.45871-dolphin123-name.surname@gmail.com-2021-01-05T12:40:05.356274.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
    3358  Stored     3358   0% 2021-01-05 12:40 c2521ef8  submission-iou=0.45871-dolphin123-name.surname@gmail.com-2021-01-05T12:40:05.356274/metrics.csv
176247138  Stored 176247138   0% 2021-01-05 12:40 a3a38cbc  submission-iou=0.45871-dolphin123-name.surname@gmail.com-2021-01-05T12:40:05.356274/model.pt
      94  Stored       94   0% 2021-01-05 12:40 a9fa1065  submission-iou=0.45871-dolphin123-name.surname@gmail.com-2021-01-05T12:40:05.356274/info.csv
--------          -------  ---                            -------
176250590         176250590   0%                            3 files
{% endraw %}
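Alternatively, the archive can be inspected from Python with the standard zipfile module; here is a small hypothetical helper that prints each member with its uncompressed size:

```python
import zipfile

def list_submission(zip_fname):
    # print each member of the submission archive with its uncompressed size
    with zipfile.ZipFile(zip_fname) as zf:
        for info in zf.infolist():
            print(f"{info.file_size:>12}  {info.filename}")
```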